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Abstract — This paper addresses the problem of simultaneous 
signal recovery and dictionary learning based on compressive 
measurements. Multiple signals are analyzed jointly, with multi- 
ple sensing matrices, under the assumption that the unknown 
signals come from a union of a small number of disjoint 
subspaces. This problem is important, for instance, in image 
inpainting applications, in which the multiple signals are con- 
stituted by (incomplete) image patches taken from the overall 
image. This work extends standard dictionary learning and 
block-sparse dictionary optimization, by considering compressive 
measurements (e.g., incomplete data). Previous work on blind 
compressed sensing is also generalized by using multiple sensing 
matrices and relaxing some of the restrictions on the learned 
dictionary. Drawing on results developed in the context of 
matrix completion, it is proven that both the dictionary and 
signals can be recovered with high probability from compressed 
measurements. The solution is unique up to block permutations 
and invertible linear transformations of the dictionary atoms. 
The recovery is contingent on the number of measurements per 
signal and the number of signals being sufficiently large; bounds 
are derived for these quantities. In addition, this paper presents 
a computationally practical algorithm that performs dictionary 
learning and signal recovery, and establishes conditions for its 
convergence to a local optimum. Experimental results for image 
inpainting demonstrate the capabilities of the method. 



I. Introduction 

The problem of learning a dictionary for a set of signals has 
received considerable attention in recent years. This problem 
is known to be ill-posed in general, unless constraints are 
imposed on the dictionary and signals. One such constraint is 
the assumption that the signals x.^ e M", i = 1, . . . ,N, can be 
sparsely represented under an unknown dictionary, i.e., each 
vector Xi can be written as 



= Ds, 



(1) 



where D e E"^'' is the dictionary, and g M*" are sparse 
coefficient vectors satisfying ||sj||o <C r. The residual energy 
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II e,; II 2 is assumed to be small. For example, |[TJ derives condi- 
tions to ensure the uniqueness of D and the representations s^, 
for the case in which all the coordinates of the signals x, are 
observed. In this setting, the problem is known as dictionary 
learning (DL) or, sometimes, collaborative DL to emphasize 
the fact that multiple signals are considered. Note that, in DL 
and in most related literature, "uniqueness" is defined up to an 
equivalence class involving permutations and rotations of the 
dictionary atoms; we follow the same convention throughout 
this paper In the DL setting, several algorithms have been 
proposed for estimating D and s^. Examples includes K-SVD 
|1| and the method of optimal directions (MOD) |181, both of 
which enjoy local convergence properties. More recently, 1 16[ 
has derived local convergence guarantees for £i minimization 
applied to DL when the signals are sparse. 

A more structured type of sparsity is considered in block- 
sparse dictionary optimization (BSDO) p6| , in which it is 
assumed that the nonzero coordinates of Sj (active atoms) oc- 
cur in blocks rather than in arbitrary locations. This property is 
called block sparsity | [T2] , and is important for the analysis of 
signals that belong to a union of a small number of subspaces, 
as described in |13 |. The standard BSDO framework, like DL, 
assumes that all coordinates of the signals x^ are observed. The 
block structure, i.e., the number of atoms and composition 
of each block of the dictionary, is in general not known a 
priori and should be estimated as part of the DL process. The 
BSDO algorithm reduces to K-SVD when the block size is 
one (standard sparsity). 

While, technically, the set of all fc-sparse signals in W 
is itself a union of subspaces, it greatly simplifies the 
problem if one considers the more structured setting of block 
sparsity. This reduces the number of subspaces to (^) , where 
L is the total number of blocks in the dictionary and K is 
the number of blocks that are active (block sparsity reduces 
to standard sparsity when all blocks are singletons). Block 
sparsity is closely related to the group LASSO in statistics 
literature | |20[ , and also to the mixture of factor analyzers 
(MFA) statistical model, as noted in (|8). The particular case for 
which only one block is active is called one-block sparsity and 
corresponds to a union of (^) = L possible subspaces, which 
is a dramatic simplification compared to the unstructured spar- 
sity case. While this might at first sound limiting, the authors 
in pO) have obtained state-of-the-art image restoration results 
with a one-block sparsity model, and a variety of alternative 
methods have been proposed for this special case |14|, p9) , 

ioi. 

In appUcations we often do not have access to data x^, but 
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only to compressive measurements 



(2) 



where A e j^mx" has m < n rows, so that e M™. In 
order to enable recovery of from even when D is known, 
the sensing matrix A must satisfy incoherence properties with 
respect to D, as prescribed by compressed sensing (CS) theory 
||6j, Learning D from compressive measurements is called 
blind compressed sensing (blind CS), and does not admit a 
unique solution unless additional structure is imposed on D 
p"7|. See also the approach in [TOl] for simultaneous dictionary 
learning and sensing matrix design in the CS scenario. 

In this paper, we address simultaneous estimation of one- 
block-sparse signals and the corresponding dictionary, given 
only compressive measurements. This unifies blind CS and 
BSDO (for the one-block-sparse case in particular). It is 
known that the standard blind CS problem, where a fixed 
sensing matrix A is used, does not admit a unique solution 
in general, although a number of special cases of dictionaries 
for which such a solution exists have been identified in flT) . 
These special cases are: (i) finite sets of bases (i.e., it is 
known that the dictionary is one member of a finite set of 
known bases for M" ); (ii) sparse dictionaries (i.e., the atoms 
of the dictionary themselves admit sparse representations) and 
(iii) block-diagonal dictionaries, with each block composed of 
orthogonal columns. 

We are motivated in part by the problem of inpainting 
and interpolating an image Q, pT[ , where one observes an 
incomplete image, i.e., we only know the intensity values 
at a subset of pixel locations (or a subset of their linear 
combinations). Additionally, the image is processed in (often 
overlapping) patches, which we convert to n-dimensional 
vectors (our signals of interest). We observe rrii < n pixels 
in each patch, indexed by i, with these nii pixels selected at 
random. Therefore, the locations of the missing pixels are in 
general (at least partially) different for each patch. This means 
that, unlike the classical blind CS setting, the sensing matrix 
is not the same for all signals; we denote the sensing matrix 
for patch i as A; e M™^^". The assumption of multiple and 
distinct A^ is crucial for solving the problem of interest, as it 
enables the use of existing results from matrix completion |24|. 
Moreover, it has been demonstrated in |29 | that using multiple 
Ai improves inpainting and interpolation performance. 

In conventional image inpainting, each A^ consists of a 
randomly chosen subset of rrii rows of the n x n identity 
matrix (thereby selecting rrii pixels). Successful estimation of 
the missing pixel intensity values is contingent on each patch 
being representable in terms of a small subset of the columns 
in the dictionary D |3l( . In other words, there needs to exist 
some D such that the are (at least approximately) sparse. 
However, our analysis is not restricted to A, being defined 
as in conventional inpainting. Other constructions typically 
used in CS, such as matrices with i.i.d. random-subgaussian 
p7) entries, or with rows drawn uniformly at random from an 
orthobasis f6l, can be employed. Moreover, in many settings 
it is appropriate to assume that the patches belong to a union 



underlying our theoretical and computational results. This 
assumption, coupled with the use of multiple A^, allows us to 
ensure recovery of D and the under milder conditions on 
D and on the number of vectors y^, as compared to standard 
blind CS. 

We show that unique recovery of the dictionary and the one- 
block-sparse signals is guaranteed with high probability, albeit 
with high computational effort, if the number of measurements 
per signal and the number of signals are sufficiently large. 
We derive algorithm-independent bounds for these quantities, 
thereby extending DL by considering compressive measure- 
ments and establishing a connection with matrix-completion 
theory. Our results reduce to those known for DL when the 
signals are fully observed. 

Additionally, we present a computationally feasible algo- 
rithm that performs DL and recovery of block-sparse signals 
based on compressive measurements with multiple sensing 
matrices. We automatically learn the block structure, in the 
same way as the BSDO algorithm p6] ; the size (number of 
atoms) and composition of each block is not known a priori 
and we only need to specify a maximum block size. Our 
algorithm is closely related to BSDO, the main differences 
being that one-block sparsity and compressive measurements 
are considered. The estimates of D, and the block structure 
are found by alternating least-squares minimization. 

It is shown that the algorithm converges to a local optimum 
under mild conditions, which we derive. This convergence 
analysis is not available for most other one-block-sparsity 
promoting methods, such as fSl, or for most methods that 
rely on standard sparsity, such as |31|. An exception is the 
analysis in | [T6| pertaining to £i minimization, where local 
convergence is proven when the number of measurements is 
at least 0{n^k) for signals with sparsity level k. However, 
compressive measurements are not considered. Besides our 
global uniqueness result, our method provably attains local 
convergence for, at most, 0{nk \ogn) measurements, although 
our one-block sparsity assumption is stronger than in p6) . 
The approach proposed in f30l also converges, but involves an 
even stronger assumption akin to intra-block sparsity, and does 
not include an algorithm-independent uniqueness analysis. 
Compelling experimental results are presented below, demon- 
strating the ability of our algorithm to perform inpainting of 
real images with a significant fraction of missing pixels. 

The remainder of the paper is organized as follows. Section 
[n] presents preliminary definitions and a formulation of the 
optimization problem. Our uniqueness result is presented in 
Section III Sections IV and [V] respectively describe the pro- 
posed algorithm and the corresponding proof of convergence 
to a local optimum. Experimental results are described in 



Section VI and concluding remarks are given in Section VII 



II. Problem formulation 

A. Preliminaries 

Assume vectors y,; G M™' ,i = 1, . . . , N are observed, such 
that 



of disjoint subspaces |23 1, pO) , with the number of subspaces 
being small. This motivates the one-block-sparsity assumption 



Yi = AiXi = Ai(Ds,; + ei) 



(3) 
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where G M" is an unknown signal. A,; G M™' ^ " is a known 
sensing matrix, D e M"^'' is an unknown dictionary, G W 
is an unknown sparse vector of coefficients such that — 
Dsi +6;, with small residual ||ei||2- We focus on the noiseless 
case, although our analysis can be extended straightforwardly 
to include observation noise, following |4|. Given we would 
like to estimate x^ by finding D and s^. Thus, we are interested 
in solving 



1 2 3 4 5 6 7 8 9 10 1 1 12 13 14 15 16 S2 S3 S4 



N 

min IIyj - AjDsj||2 , 

D,Si....,SjY ^ — ^ 
i—1 

Si. Si one-block sparse 



(4) 



with our estimates denoted D, s^i, . . . , s^r. 

It is assumed that each x^ lives in a subspace specified by a 
subset of the columns of D, so that there exists a permutation 
of the columns such that x^ = Ds; and the coefficient 
vector Si is one-block sparse, as defined below. It is also 
assumed that D is composed of blocks (subsets of columns) 
corresponding to the blocks of s^, with the cardinality of 
the blocks summing to r. The cardinality and composition 
of each block are unknown, although we fix a maximum 
cardinality as in BSDO. The atoms in each block are assumed 
orthonormal, in order to avoid scaling indeterminacies and 
also to obviate concerns with sub-coherence (see p2| for a 
discussion). The expression in Q, particularly the £2 norm, is 
equivalent to a maximum-likelihood solution for the unknown 
model parameters, assuming that the components of are i.i.d. 
Gaussian and the model deviation is negligible. 
Definition 1 (Block sparsity and one-block sparsity). Let 
the dictionary D G M"^'' have a block structure such that 
D = [D[1]---D[L]], where each T>[£],£ G {!,..., L} is 
a unique subset of the columns in D, and the columns of 
D[£] are assumed orthonormal for all £. Following (TTl, a 
signal Si is A'-block-sparse under dictionary D if it admits 
a corresponding block structure s^ — [si[l]'^ ■■■ Si[L]'^]'^ 
and if s^ has zero entries everywhere except at a subset 
Si[£i], . . . , Si[£j^] of size K L, corresponding to blocks 
D[£i], . . . , 'D[£k] of D. Each dictionary block !)[£] G R""""^ 
and each coefficient block Si[£] G M''*. In general, the block 
size kg is not known a priori, although it is common to define 
a maximum size fcmax- 

We are specifically interested in one-block sparsity, where 
K = 1. An interpretation of one-block sparsity is that 
it corresponds to signals that live in a union of L linear 
subspaces, with each subspace spanned by the columns of one 
block D[£]. If block 'D[£] has k( columns, then its subspace 
is of dimension fc^. Figure [T] illustrates the concept of one- 
block sparsity. The index set of all signals which use block £ 
is uji = {i : Si[£] 7^ 0}. Note that for one-block-sparse signals 
there can be no overlap between uj^^ and a;£, for £1 ^ £2- 

The optimization problem Q can be decomposed as 

^ ||y,-A,D[^]s,[£]||^, (5) 

s.t. T>[£f'D[£] ^ I, 

for all £, where Si[£] G M''* is the ^-th block of s^. The solution 
to problem (HI is only unique up to the following equivalence 



mm 

W(.,D[£],si[£],...,sjx 



D[l] D[2] D[3] D[4] 



o,={ll a)2=i3,4) t03={2} m4=n 



Si[l] 



Si[3] 



Si[4] 



Fig. 1. Illustration of the concept of one-block sparsity. The figure 
represents a dictionary with r = 16 columns, arranged consecutively in blocks 
D[l], . . . , D[4], all of size four Also represented is a coefficient matrix with 
columns si,S2, S3 and S4. Non-zero coefficient values are colored, while 
zero-valued coefficient values are blank. All signals are one-block sparse, as 
they only use one block each. Note that there exists no overlap between index 
sets uj\,u)2, ^^3 and a;4. 



class. 

Definition 2 (Equivalence class for D^si 
a solution (D, s^i, . . . , s^at) to with D = 



and Sj ^ [si[l] ' ' ' 
£'i, . . . ^ £'^ of the indices 1. 



TlT 



. . , Sat). Given 
D[1]...D[L]] 
I — 1, . . . ,N, any permutation 
L corresponds to an equivalent 



solution (D', s'l, . . . , s'^) of the form D' = [T)[£[] • • • D[^^ 



and s', = [sS[] 



]TlT 



= 1, 



,N. Addi- 



tionally, any sequence of invertible matrices Tg G 
£ = 1, . . . ,L, where kg is the size of the £-th block, also 
corresponds to an equivalent solution of the form D' = 
[D'[1]...D'[L]] and = [(s,[l]')^ ••• iML]'])T, * = 
1,...,N, with T>[£]' = 'D[£]T(, and s^^]' = Tj^Si[£]. 

Our first goal is to develop conditions under which (|4]) 
admits a unique solution. To this end we note that the index 
sets uj( are unknown, and therefore estimates ujg must be 
found. This is due to the fact that, a priori, we do not know the 
correct assignments of columns to blocks. In our uniqueness 
analysis, we follow the same strategy as 1 1 1 and assume that all 
possible index sets can be tested. This is admittedly impractical 
in general, and it is done only for the purpose of constructing 
the uniqueness proof The algorithm proposed in Section IV 
provides estimates of lu( without exhaustive search. 

For a given utg, (j5]l is an instance of the weighted orthonor- 
mal Procrustes problem (WOPP), which in general requires 
iterative optimization without guarantee of convergence to a 
global optimum |21 1. In the following, we establish conditions 
under which uniqueness can be guaranteed. To this end we 
show that (|5]l can be written as a matrix-completion problem, 
and then rely on known results developed in that context. 

B. Formulation as a matrix completion problem 

To reformulate (jsjl, define X^, G K"^!"^*! as the matrix 
whose columns are the signals x, that use block £, i.e., with 
i G LOi, and similarly define S^^J.^] G as the matrix 

whose columns are the coefficient blocks sM for i G uJi. 
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Since D[^] has only columns and S^Ji'] has ki rows, it 
is clear that Xi^^ ~ D[£] S^^Jf] has rank at most kg. In fact, 
unless the null space of D[^] intersects the range of 8;^^, the 
rank is exactly ki. This should not happen in general, barring 
degeneracies that are precluded by the conditions assumed 
in our results. This suggests the strategy of treating each 
subproblem (|5]), assuming oji is known and hence the subspace 
selection has been determined (in our method we address 
estimation of oji), as a low-rank matrix completion problem, 
as we illustrate next. 

Define A e M*^^" with M > n, as the matrix constructed 
by taking the union of the unique rows of all sensing matrices 
Ai, for i ^ LUi (in the interest of notation brevity, we omit the 
dependence of A on £). For example, in the aforementioned 
inpainting problem, for which each A; is defined by selecting 
rows at random from the n X 12 identity matrix, denoted Inxn^ 
we have A equal to a subset of I„xn and in the limit, given 
enough A^, A — I„xn (up to row permutation). However, 
the discussion below considers more-general A^, for example 
composed in terms of draws from a subgaussian distribution. 
Let e M^'^><l"«l be defined as A X^,. When 

performing measurements, we do not observe all elements of 
Yi^/, for each column i e uji, we only have access to the 
entries selected by A^, these corresponding to the matching 
observed vector y^. This is illustrated in Figure [2j which also 
shows a pictorial comparison with the cases of fully measured 
signals (DL) and a single measurement matrix A (standard 
blind CS). 

Denoting the locations of the observed entries of as 

n = {{u,v) : Y^f{u,v) is observed}, (6) 
the observation model can be written as 

Pn{Y^,) = PniAX^J - Pn(AD[^]S„J£]), (7) 

where Po(-) is the operator that extracts the values of its 
argument at locations indexed by ft. Each subproblem (j5]l can 
then be reformulated as 



mm 

s.t. 



Pn(Y^J-Pn(AD[^]S, 



BliVDl 



(8) 



where || • is the Frobenius norm. If X^^ has rank ki (the 
block size), assuming A has rank greater or equal to ki and 
does not reduce the rank of X^^^ (since ki is generally small 
and A is assumed to have rank n, these conditions for A are 
anticipated to be typical), we have that Y^^^ is also of rank 
ki. This establishes the fact that, when X^^ has low rank, so 
does Y(jj (matrix completion results are most useful when 
the rank is low, although they hold for any value of the rank). 
Each subproblem (|8]l can then be solved by first completing the 
matrix Y^^^ and then obtaining X^^ and, subsequently, D[£] 
and StjJ^]. Note that, although this is a two-step process, if 
A has rank n then it can be inverted to find a unique mapping 
from any estimated complete matrix Y^^ to a corresponding 
estimated X^^,. As long as X^^, can be correctly estimated, 
the solution to ([8]l can then be found by a singular value 
decomposition (SVD) of X^^,. This is due to the fact that, 
for rank(D[£]S(^j) — ki, SVD minimizes the Frobenius norm 



of the residual X^, - D[^]S„, [i] w.rt. D[£] and S^, [£], by the 
Eckart- Young theorem fTTl. 

Following matrix completion theory ||5], ||7], p4) , p5) , if 
Y^j is truly of low rank (and additional technical conditions 
are met), it can be correctly completed with high probability 
by solving the convex program 



rmninuze 



(9) 



s.t. Po(Y„J = Pn(Y„,), 



where || • ||* is the nuclear norm, which is defined for a generic 
matrix Z of rank k as the sum of its singular values, i.e.. 



1Z|U=^7.(Z) 



(10) 



with 7i(Z) indicating the i-th singular value of Z. Importantly, 
problem (|9]) is not equivalent to ([8]l, since a solution of (|8]l may 
not be a solution of (|9|. This is a reflection of the fact that 



there may exist a high-rank Yi^^, which solves ([8|l but has 
high nuclear norm. However, the opposite holds: a solution of 
(|9]l is always a solution of ([8]l, and therefore of (|5]l. Moreover, 
the solution of (|9]) will have low rank and therefore produce 
D[£] with the smallest possible block size ki, which is clearly 
desirable. Thus, the nuclear norm formulation yields those 
solutions of the original problem ([8]l that are useful. 

In the following section, we state conditions on the number 
of observed entries and on A for successful recovery of Y^^, 
X^^ and, subsequently, of D[^] and Si[£], . . . ,sjv[^], assuming 
exhaustive search over uji, by exploiting the connection be- 
tween (jSj), (|8]l and (j9]l. In Section IV we propose an algorithm 
that does not require exhaustive search and is guaranteed 
to converge to a local minimum under mild conditions here 
established. The notation used in this paper is summarized in 
Table 1 below, for ease of reference. 

TABLE I 
Quick notation guide 





Observed vector i 


rrii 


Number of observed coordinates of vector 


Xi e M" 


Unknown signal i 


s, e R"" 


Sparse coefficient vector i 


D e 9."^'^ 


Dictionary 


D[^] e R"^*^ 


Dictionary block i, i.e. the ^-th subset of the 




columns of D 


ki 


Number of atoms in block £ 


uje = {i: Si[e] 0} 


Set of indexes of the signals that use block £ 


of the dictionary D with coefficients 




Number of signals that use block £ 




Subset of the signals associated with block £ 


e R'-xi"«i 


Subset of the coefficient vectors 




associated with block £ 




Block £ of Si, i.e. the £-th subset of the rows, 


corresponding to D[^] 




Block ( of sparse coefficient vector 




(^-th subset of the rows) 


Ai e R'^'X" 


Sensing matrix for vector i 


A e R*f'<" 


Union of the rows of multiple sensing matrices 


g]RMx|^fl 


Incompletely observed data matrix associated 


with block £ 


n 


Observed locations of Y^j^ 


(El 


Kronecker product 
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Fig. 2. Illustration of the observed elements of Y^j^ with multiple measurement matrices (bottom), compared with the DL (top left) and standard blind CS 
(top right) cases. The colored squares represent observed elements, while blank squares represent unobserved elements of Y^j^ . Here we assume that Y^^^ 
has M = n rows. 



III. Uniqueness result 

Our results rely on the concepts of spark and coherence, 
which we now define. The spark of a matrix is the smallest 
number of columns that are linearly dependent; the number of 
linearly independent columns is the rank. Additionally, let U 
be a subspace of M" with dimension k and spanned by vectors 
{z„}t(=i .. .fc. The coherence ofU vis-d-vis the standard basis 
{e„}i,=i_...,„ is defined as p4) 

m(Z^) = ^max||z^e,||2. (11) 

This quantity is very similar to the coherence of a matrix, as 
defined in |28|. Our results also build on the following DL 
uniqueness conditions. 

Definition 3 (Uniqueness conditions for DL |1|). 
. Support: llsillo - /c < ^,Vi 

• Ricliness: There exist at least fc + 1 signals for every 
possible combination of k atoms from D. With regular 
sparsity level k , this amounts to at least (fc + 1)(^) 
signals. For one-block sparsity with L blocks of size 
k£,£ = 1, . . . , L, however, we need + 1) signals, 
which is typically a far smaller number 

• Non-degeneracy: Given fc + 1 signals from the same 
combination of fc atoms, their rank is exactly fc (general 
position within the subspace). Similarly, any fc + 1 signals 
from different combinations must have rank fc + 1. 

These conditions apply to the case of fully measured signals 
and constitute a limiting case, since our problem reduces to 
DL when all are equal to the identity matrix. Even in 
this limiting case, however, the one-block sparsity condition 
has the advantage of greatly reducing the number of possible 
subspaces and therefore the required number of signals, as 
stated in the richness property definition above. Note that 



we do not require prior knowledge of the block size ki, as 
explained below. We now present our main result. 

Tlieorem 1 (Uniqueness conditions for blind CS witli 
multiple measurement matrices). Let e M™% with 
i — 1, . . . , N, be a set of observed vectors, obtained through 
projections of unknown signals G M" so that each 
Yi = AiXi and A^ e j^mixn ^ known sensing matrix. 
Furthermore, let — Ds^ where Si G MJ" is an unknown 
vector of coefficients obeying one-block sparsity according 
to Definition 1 and D G M"^*" is an unknown dictionary 
having a block structure such that D = [D[l] • • • D[L]], with 
D[£]^D[£] — 1. Let kg be the number of columns of block D[£] 
. In addition, let ui be the index set of the vectors yiifor which 
the corresponding Si have block Si[£] active, i.e., nonzero, and 
let \uji\ denote the number of such vectors. Define A G M*^^" 
as the union of the unique rows of all Ai for all i G uJe, so 
that Yi^j G R^^l"*^! with M > n has columns given by Axi, 
for i G UJi, and only a subset of the elements in Yi^, are 
observed. Define Mu = min(A/, \uJi\), M2i = max(M, \uji\) 
and let ~ USV"'" be the singular value decomposition of 
Ytj^. Define /i^ — max(/^^, /^o). where /ii is an upper bound 
on the absolute value of the entries of UV"^ {MuM2i) /kg 
and po is an upper bound on the coherence of the row and 
column spaces ofY^^g.. 

Then, by solving problem ^ one can exactly recover all the 
blocks ofT) and the coefficient vectors Si up to the equivalence 
class presented in Definition 2, with probability at least 1 — 
61og(M2)(Mi + M2)2-2/5 - M^'^"^ for some /3 > 1, if the 
following conditions are met for each £ € {1, . . . , L}. 

(i) For all i ^ uJi, ||s,;||o = fc^ < 

(ii) \uje\ > kg. 

(iii) The vectors G LUi, are non-degenerate, i.e., any 
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subset of k < ki vectors span a subspace of rank at 
least k. 

(iv) 32iiiki{Mi£ + M2i)/3\og{2M2i) entries of the matrix 
Ytj^ are observed uniformly at random. The total num- 
ber of observed entries is, thus, X^fci (Mi£ + 
M2,)/31og(2M2«). 

Proof of Theorem 1: Condition (i)-(iii) are analogous 
to the DL uniqueness conditions in Definition 3, which are 
proven in |1|, and are always required even if all are fully 
measured. Condition (i) is adapted to our setting by imposing 
the spark restriction on AD; this condition ensures that no 
two dictionary blocks are linearly dependent, and that the 
sensing matrices are sufficiently incoherent with respect to 
the dictionary. This is the case, with high probabilty, when 
A obeys a random subgaussian or orthobasis construction |j6|, 
0. 

Condition (ii) stems from the fact that any fc^-dimensional 
hyperplane is uniquely determined by ki + 1 vectors, and since 
a subspace will always contain the origin, only k( additional 
vectors are required. We write \uJe\ > k( instead of \ujg\ > 
ki because, in the setting of one-block sparsity, there exist 
N = X^fci vectors from a union of L different subspaces, 
and we need to be able to assign vectors to their respective 
subspaces. 

If condition (iii) holds, which precludes spurious colineari- 
ties or coplanarities, then the assignment can be done by the 
(admittedly impractical) procedure of testing the rank of all 
possible matrices constructed by concatenating subsets 

of fcf + 1 column vectors, as assumed in (l]. If the rank of 
such a matrix is ki, then all its column vectors belong to 
the same subspace. Otherwise, the rank will be kg + 1. It is 
not necessary to know kg a priori, provided that we test all 
possible values fc^ = 1, . . . , n — 1 in the following way. Start 
by setting kg ~ 1 and testing all possible subsets containing 
kg + 1 = 2 signals. If any such subset has rank one, then 
we have found a (singleton) block. Any other possibility is 
precluded by the non-degeneracy condition (iii). If multiple 
subsets have rank one, then test the subspace angles in order 
to determine how many distinct singleton blocks exist. Record 
the corresponding blocks and signal assignments and remove 
the signals involved from further consideration (due to one- 
block sparsity, each signal belongs to only one block). Iterate 
this procedure until either: (a) we have exhausted the signals; 
(b) all r atoms have been clustered into blocks, or (c) we have 
reached kg = n. 

We now address Condition (iv). We have jw^j > kg vectors 
from subspace £ and we assume that A has rank n and is, 
therefore, invertible. As discussed in Section |ll] Y^^^ is an 
incomplete low-rank matrix. Under conditions studied in p4) , 
the convex program (|9| will recover the complete Y^^^ with 
high probability. Therefore, by showing uniqueness of the 
complete low-rank matrix Y^^, we can show the uniqueness 
of the corresponding subspace spanned by D[£], which can be 
found by first solving the optimization (|9| and then taking the 
estimate Y^^^ and performing the SVD of A+Y^^^. 

We now invoke Theorem 1.1 in | |24l which states that, 
under the conditions and with the probability specified in our 



theorem, the minimizer of problem (|9]) is unique and equal to 
the true Y^^^. This concludes the proof. ■ 

Note that the aforementioned theorem from p4) , like pre- 
ceding results I?], is based on having at least one entry for 
each row and for each column; this is already accounted for in 
the derivation of the bound on and associated probability of 
successful recovery under a uniformly-at-random pattern for 
the missing data. If there is a fixed A; = A (with less than 
n rows), or if there is any row or column entirely missing 
from il, then there is no recovery guarantee. This is the 
case of standard blind CS, where there is a fixed A which 
is the same for all signals, and thus there are entire rows 
without observations, as illustrated in Figure |2] Hence, blind 
CS requires additional constraints on D, as explained in fTT^. 
Using multiple A^ allows us to avoid those constraints. Also, 
even when M > n, we are still subsampling because we do 
not observe all entries of Y^^ . 

The limiting case when all blocks are singletons, i.e., kg — 
1 for all I, coupled with the one-block sparsity assumption, 
means that each patch belongs to a straight line in M", which 
is a very strong restriction. On the other hand, if all kg — 1 but 
the one-block sparsity assumption is removed and there are K 
active blocks (singleton atoms when kg — 1), then we revert 
to standard sparsity; Theorem 1 still applies, but we would 
need to be test (j^) possible combinations of atoms, which 
may be an extremely large number In fact, the conditions 
in Theorem 1, with standard sparsity and additionally with 
all A = Inxn, reduce to those in DL. One-block sparsity 
makes Theorem 1 more appealing, since then there are only 
L possible combinations instead of (]^). 

Under Theorem 1, recovery is contingent on having the cor- 
rect clustering given by ujg. In the above-presented analysis, it 
is assumed that we have the computational ability to exhaus- 
tively search over all index sets ojg. Under the conditions of 
Theorem 1, this search will achieve the correct assignment of 
signals to subspaces (clustering), and we need only concern 
ourselves with vectors that all come from the same subspace. 
This is done only for the purpose of constructing a proof, 
following the same strategy as in Also, like the DL 
uniqueness result in ||T|, Theorem 1 is overly pessimistic in 
the required number of measurements. Specifically, for image 
inpainting applications, where Yi^^, typically has far more 
columns then rows, the prescribed number of measurements 
can be excessive (often orders of magnitude larger than the 
total number of pixels). This reflects a limitation of the current 
state-of-the-art in matrix completion theory, and we stress 
that other uniqueness results, such as fTj for the simpler DL 
setting, suffer from the same problem. Nevertheless, Theorem 
1 provides peace of mind by guaranteeing the existence of 
a unique optimal solution given enough measurements. It 
remains necessary to address the practical issue of finding a 
solution with reasonable computational effort, from realistic 
amounts of unclustered data. 

Toward that end, in the next section we propose an algorithm 
that can estimate the clustering and each of the subspaces 
without combinatorial search, although it is not guaranteed 
to reach the globally optimum solution. The algorithm uses 
alternating least squares, similarly to BSDO, K-SVD and 
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Initiiliie D"" 

and ( = 1 



Estimale block structure 
with SAC/BOHP 
given D''"'' 



Estimate block 
ttssignments i^t.V^ 




are fixed (as estimated in the previous SAC step), then we 
need to solve, for each £, 



D\£],S 



(12) 



s.t. T>[ifT>[£] = I. 



The optimization alternates between D[£] and S^^Jf]. For 
now we will set the constraint D[i?]-^D[i'] = I aside and 
focus on the unconstrained problem; we will return to the 
constraint later. A description of the two main algorithm steps 
is presented below, followed by the derivation of conditions 
for convergence. 




Fig. 3. Block diagram of the overall dictionary learning procedure. The 
analysis in this paper focuses on the shaded block, where we plug our 
Algorithm 1 instead of the BK-SVD step in [26]. 



MOD. It is computationally efficient and enjoyes local conver- 
gence guarantees given much fewer measurements than those 
prescribed by Theorem 1. In Section [V] we show convergence 
of the algorithm to a local optimum, as long as specified 
algorithm-specific conditions hold. 

IV. Algorithm 

Our algorithm expands on the iterative procedure described 
in p6) . This procedure alternates between two major steps: 
(i) inferring the block structure and coefficients using sparse 
agglomerative clustering (SAC) | j26) and block-orthogonal 
matching pursuit (BOMP) |12|, and (ii) updating the dictio- 
nary blocks using block-K-SVD (BK-SVD) |26|. The latter 
step assumes fully measured data. Briefly, SAC progressively 
merges pairs of blocks £i , £2 for which the index sets uig-^ and 
Wf.-, are most similar, up to an upper limit (fc^ax) on the block 
size. BOMP sequentially selects the blocks that best match the 
observed signals and can be viewed as a generalization of 
OMP \l9j, |j22) for the block-sparse case. Similarly, BK-SVD 
is an extension of K-SVD for block-sparsifying dictionaries. 
A more detailed explanation of these methods may be found 
in | |26| and the references therein. 

We follow the procedure in J26) , but replace the BK- 
SVD step by our Algorithm 1 in order to take compressive 
measurements into account, as the block diagram in Figure [3] 
illustrates. We employ alternating least-squares minimization 
to find D and S in a manner similar to existing methods such 
as K-SVD or MOD but adapted to the CS setting. 

Recall that we are solving the optimization problem (j5]l. 
Assuming the current estimate of uji and the block structure 



A. Dictionary update 

Holding Si fixed, we can analytically find D[£] using 
properties of the Kronecker product and the vectorization 
operator, denoted as vec( ). This is similar to MOD | [T8) in 
that the entire dictionary block is updated at once. For a 
generic matrix Z, vec(Z) is the column vector constructed 
by vertically stacking the columns of Z. For simplicity of 
presentation, and without loss of generality, assume that all 
rrii = TO. Let Y^^^ denote the matrix whose columns are the 
vectors y^, so that we have |vec(Y(jj)| — m\uJe\. We rewrite 
the unconstrained problem over 'D[£] as 

vec(Y„J-Bvec(D[€]) 



mm 

DM 



where B is defined as 



(13) 



B 



.[£f(E>A, 



(14) 



with (g5 denoting the Kronecker product. This is a least-squares 
optimization problem for a standard system of linear equations. 
The solution 



vec(D[£]) = B+vec(Y^J 



(15) 



is well-defined and unique (recall that we are holding the Sj, 
and therefore B, fixed) as long as B has rank equal to k^n, 
the number of unknowns in vec(D[£]). 

B. Coefficient update 

After obtaining D[^] for the current iteration, the coefficient 
vectors Si[£] are found analytically by least squares as 

s,[^] = (D^[£]Af A,D[£])-i(D^[^]Afx,). (16) 

The factor (D^ [£]Af AiT)[£])^^ is invertible under conditions 
speficied in our convergence result below. 

C. Orthogonality constraint 

The orthogonality condition D[£]^D[^] = I is currently 
imposed a posteriori by performing a SVD decomposition of 
D[£] and applying the resulting unitary rotation matrix to the 
Si[£]. This is equivalent to Gram-Schmidt orthogonalization, 
and will not change the locations of the nonzero elements 
of the Si, although the blocks D[^] will not have orthogonal 
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columns during the iterative procedure. This procedure has 
worked well in practice, and we therefore do not pursue a 
more direct constrained minimization. 

In addition to the proposed Algorithm 1, we have also im- 
plemented and tested an alternative method where, instead of 
alternating least squares, we perform convex optimization of D 
and S jointly (given the estimate of uji provided by the current 
SAC step) by completing matrix Y^^,, which is closer in spirit 
to the ideas underlying the uniqueness conditions in Theorem 
1. However, existing general-purpose convex optimization 
packages do not readily scale up to matrices containing 
millions of entries as in our inpainting experiments, as noted 
in |[3|. Using mixed nuclear/Frobenius norm minimization via 
the singular value thresholding (SVT) approach proposed in 
||3], we have attained reconstruction performance comparable 
to our Algorithm 1, but at much higher computational cost. 
We have also found that successful convergence with this 
alternative SVT-based approach depends on careful tuning 
of step-size and regularization parameters. In contrast, our 
alternating least squares method does not have any tuning 
parameters (other than the maximum block size fcmax) and 
is computationally far less demanding. 

Algorithm 1 - Joint estimation of D[^] and S„J£] via 
alternating minimization 

Initialization: Use the estimates of cj^, S^j^, and the block 
structure of D from the preceding SAC/BOMP step 



for all £ do 



Form the matrix B 



- Update block £ of the dictionary by computing 
vec(D[£]) =B+vec(Y„,) 

- Orthogonalize T>[1] 

- Update the coefficients: = 
end for 



V. Convergence result 

The following result establishes convergence conditions for 
our proposed alternating least-squares algorithm. 

Proposition 1 (Convergence of Algorithm 1). Let G M'"', 

with i = l,...,N be a set of vectors that satisfy the 
conditions stated in Theorem 1. Let Ai € t,e sensing 

matrices and £ M" signals such that = A^x^. Assume 
each signal is of the form = Ds^, with D a dictionary 
comprised of blocks D[l], . . . , D[L], where each block has 
kg atoms, and Si G K' a one-block-sparse vector Let lo£ 
be the index set of vectors x^ from dictionary block £, and 
let Y(^^ G ^^■t>^\^e\ Q„ incomplete data matrix such 
that PaO^uii) = -Pn(AD[£]S„J£]), with VL the locations of 
observed entries, and A G K^^^" the union of the rows of the 
Ai. Then, the alternating minimization procedure described 



in Section will converge to a local minimum of Q, with 
probability specified below, if the following conditions hold: 



(i) For all £ and all i G t^e, 11 s 



er(AD) 



(ii) For the subset of vectors associated with each block £, 
we have \uj{\ > n. 

(iii) The total number of observed values, is 0(kin) if the 
elements of all Ai are i.i.d Gaussian, and 0{kin\ogn) 
if the rows of Ai are selected uniformly at random from 
A and rank(A) — n. 

(iv) Each vector y^ has rrii > kg measured values. 

The probability of convergence, for each block £, is equal to 



one with i.i.d. Gaussian Ai, and equal to 1 



for 



random selection of rows from A, where /3 is the constant 
such that \^\ — /3nlogn. 

Proof of Proposition 1: Since condition (i) is the same as 
in Theorem 1, we focus on conditions (ii)-(iv). As mentioned 



above, the analytic solution (15 1 is well-defined and unique 



as long as B has rank equal to kin. Examining (13i and 

are non-degenerate {i.e., 
the subset of nonzero rows of 



(14 1 we can see that, if the s 

•=k 



rank(S^ 



with St, 



Si^i), then the rank condition is equivalent to having the 
number of observed elements |vec(YtjJ| — \Q\ > kgn and 
also 



(17) 



that is, the vertical concatenation of all A^, having rank n. This 
is due to the fact that rank(si(8) A^) = rank(sj) xrank(Ai), Vi. 
The number of observed values needed to ensure this rank 
condition depends on the probabilistic mechanism that gener- 
ates the A^. We will analyze two such mechanisms: random 
Gaussian and random subset of projections. Without loss of 
generality, assume all rrii — m (we can always treat m as a 
lower limit on the nii). 

Random Gaussian: We assume that all A^ have elements 
independently drawn from a Gaussian distribution. Then, with 
probability one, any subset of size n of the mjoj^ | rows of F 
are linearly independent. Therefore, the condition m\uje\ > n 
ensures rank(r) = n; since we need mjcLifl > kgn (and 
therefore jw^j > n) in order to have enough observed entries 
anyway, the latter condition ensures that the rank of T is 
adequate. Note that this reasoning assumes that the A, do 
not share rows; if they do share rows, e.g., in the case when 
there is a finite pool of random-Gaussian projections and the 
rows of Ai are subsets from the pool, then we must use the 
following result instead. 

Random subset of projections: We analyze the case for 
which the rows of each sensing matrix A^ are randomly 
drawn from a pool of linearly independent rows, the union of 
which constitutes A. The typical example of this situation is 
when the rows of A form a random orthobasis, although here 
orhonormality is not required (linear independence suffices). 
This includes the case when A is the identity matrix, which is 
most relevant to the inpainting/interpolation problem. Note that 
the problem can be treated as an instance of the classic Coupon 
Collector problem (see, e.g., p5)). There are m|wf | rows (the 
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number of rounds our "collector" draws a randomly chosen 
row, or "coupon") that have to span a space of n dimensions 
(the different types of "coupons" we need to collect). Denote 



as Z. 



m\uj£\ 



the event that no coupons of type i have been drawn 



after m|wf| rounds. It is easy to see that 



P\Z' 



m\LUi\ 



1 

1 — 

n 



(18) 



where the inequality comes from our ignoring, for simplicity, 
the fact that the rows are drawn without replacement within 
each sensing matrix (stated otherwise, there are no repeated 
rows within each A^). Then, by the union bound. 



P 



u 



< n 1 - 



1 



m\uJi I 



(19) 



in Theorem 1 for uniqueness. However, this is not surpris- 
ing since Theorem 1 is based on matrix completion theory, 
which establishes slightly pessimistic sufficient conditions for 
strong unique global recovery guarantees, while Proposition 
1 only establishes convergence of our algorithm to a local 
minimum. It is also interesting that, in the case of random 
subsets of projections, the 0{kin\ogn) requirement matches 
the information-theoretic limit established in ||7j for matrix 
completion. This similarity is due to the fact that, as in ||7| 
and other work, we make use of the Coupon Collector model. 
Regarding the lower bound on m^, since we typically expect 
the block size to be small, the condition nii > kg is relatively 
mild and should only be of concern for extremely small 
measurement fractions. 

VI. Experimental results 



The left-hand side is precisely the probability that T is not 
rank n. To look at how this bound scales relative to n, we can 
apply the inequality 1 — x < e^^ "i-*-"'" ''i 
e Then, consider mlw^l : 



The algorithm proposed in Section IV is validated by 



^ to obtain (l- ;^)"""" < 
/3n\ogn, with /? constant, 
and define T to be the minimum value of m|ajf| before we 
achieve full rank, so that P[T > m\uj£\] ~ P[T > /Snlogn] ~ 



P 



2\uJt\ 



. We have 



P[T > Pnlogn] < n 1 



< ne 



ne 



- [in log 



(20) 



Therefore, we need O(nlogn) observed values in order to 
achieve rank n with arbitrarily high probability. Hence, we 
need 0{nki\ogn) for solving (13i for each block. Although 
we do not impose the orthogonality contraint directly in the 



minimization, any solution of ( 13 1 can be orthonormalized and 



thus become a solution of \\2) without changing the subspace 
defined by D[f]. 

Concerning the coefficient update expression ( [T6) l, 
the issue of invertibility comes up due to the term 
(D^[€]Af AiD[^])"^ Note that the matrix being inverted has 
size hi X fcf. The rank of AfAi is equal to rrii, the number 
of rows of Ai. For invertibility of the above expression, we 
need to have kg < rrii, so that there is no loss of rank. This 
means that our method requires a lower bound on rrii. 

We conclude the proof by invoking, as in |[T], the fact 
that all of the steps of this alternating minimization method 
are optimal under the above assumptions, and hence cannot 
increase the value of the objective function. Therefore, since 
the objective function is bounded from below by zero, the 
algorithm will converge. ■ 

We have proven convergence in the objective function, 
which is a weak form of convergence in the sense that the 
objective function will attain a minimum but the estimates 
might not reach a stopping point. However, convergence in 
the objective function is still a useful result, and it is the same 
guarantee as K-SVD and MOD. In practice, the above result 
ensures that the required number of signals scales linearly 
with kg and also linearly (up to a log factor in the case 
of measurements using an orthobasis ensemble) with signal 
dimensionality n. These are more favorable bounds than those 



inpainting the well-known "Barbara" (512 x 512 pixels) and 
"house" (256 x 256 pixels) images, for varying percentages 
of observed pixels (these two examples are representative of 
many others we have considered). The images are processed 
in 8 X 8 overlapping patches, treated as vectors of dimension 
n = 64. In Figures |4] and |5] we present the original images and 
the test versions, with 25%, 50% and 75% of the pixel values 
r^/Qbserye0^(selected uniformly at random). In all experiments, 
the total number of dictionary elements is set to r = 256. 
Figures |6] and |7] show the inpainting results achieved by our 
algorithm from 50% observed pixels, using maximum block 
sizes fcmax — 4 and fcmax = 8. The peak signal-to-noise ratios 
(PSNR) for this case are shown in Table [11] 

TABLE II 

Peak signal-to-noise ratios (PSNR) in inpainting tasks with 

50% OBSERVED PIXELS 



"Barbara" 
"House" 



27.68 dB 
3L80 dB 



27.93 dB 
32.03 dB 



The PSNR values for varying percentages of observed pixels 
are plotted in Figure [Sj for fcmax equal to four and eight. 
Results are averaged over ten runs with different random lo- 
cations for the missing pixels, with error bars showing the one 
standard deviation interval. These experiments are intended to 
validate our method, rather than claiming outperformance of 
the state-of-the-art in image inpainting and interpolation. Our 
performance is comparable to that of the algorithm described 
in IjSlJ, which we have used on the same images. However, 
our model and algorithm are significantly simpler and, unlike 
|31|, our algorithm has convergence guarantees. Also, while 
the PSNR achieved in our experiments is lower than in the 
state-of-the-art approach in |30|, the results are not directly 
comparable due to the fact that additional structure is assumed 
in (3Q\, including fast decay of the singular values within each 
block (which is analogous to internal sparsity). 

Like other authors (e.g., JTSJ), we have observed a phase 
transition phenomenon where the reconstruction performance 
falls sharply for measurement percentages under a rank- 
dependent threshold (near 40% observed pixels in our case). 
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This is shown in Figure |9j which was obtained using as input 
subsets of pixels from a union-of-subspaces approximation 
of the "Barbara" image for block sizes four and eight, thus 
matching our model exactly. We plot the empirical frequency 
of successful reconstructions over ten realizations of the miss- 
ing pixel locations, for varying measurement percentages. The 
reconstruction is deemed successful when the PSNR exceeds 
40 dB. 

In addition, the learned dictionaries for fcmax = 4 and 
fcmax — 8 are depicted in Figures 10 and 11 It is possible 



to observe that the atoms within each dictionary block are 
more similar to each other than those in different blocks. This 
is expected and desired clustering behavior, as one would like 
the blocks to reflect different image properties. This behavior 



is further illustrated in Figure 12 where the index of the 
block used by each patch is represented by a corresponding 
color (to avoid clutter, this is shown for the most frequently 
used dictionary blocks). It is apparent that patches with similar 
texture tend to share blocks. 

In all cases tested, convergence occurs after a maximum 
of ten iterations. A non-optimized MATLAB implementation 
of Algorithm 1, including the SAC/BOMP steps, requires 
approximately 12 hours for inpainting the full 512 x 512 
"Barbara" image (worst case) on a computer with CPU clock 
frequency of 2.52 GHz. This computation time is similar to the 
BSDO algorithm described in |26|. We employ overlapping 
patches in order to avoid block artifacts, leading to a total 
of A*" = 255,025 vectors being processed for the "Barbara" 
image, with iV = 62,001 for the "house" image. Note that 
the overlapping procedure follows the same "non-overlapping 
sensing with overlapping reconstruction" strategy as p9) , | [3T) ; 
this procedure employs a distinct sensing matrix per non- 
overlapping patch, with the image reconstruction performed 
by averaging the overlapping estimated patches. 




40 60 
Percentage observed 



100 



Fig. 9. Illustration of the phase transition phenomenon, using as input subsets 
of pixels from an approximation of the "Barbara" image that exactly obeys 
the union-of-subspaces model, with block size (rank) four and eight. The 
plot shows the normalized frequency of successful reconstructions over ten 
realizations of missing pixel locations, for varying percentages of observed 
pixels. We declare a successful reconstruction when the PSNR of the estimate 
is greater than 40 dB. 



VII. Conclusion 

We have proposed a framework for simultaneous estimation 
of one-block-sparse signals and the corresponding dictionary, 
based on compressive measurements. Multiple sensing ma- 
trices are employed, which allows use of low-rank matrix 
completion results to guarantee unique recovery (up to a 
specified equivalence class) for a sufficiently large number 
of signals and measurements per signal; bounds are derived 
for the number of measurements. The assumption of one- 
block sparsity is related to the union-of-subspaces signal 
model and to the well-known mixture of factor analyzers 
(MFA) statistical framework. Existing results for the related 
problems of dictionary learning (DL) and block-sparse dictio- 
nary optimization (BSDO) are extended, due to the analysis 
of compressive measurements, and blind compressed sensing 
(blind CS) is also extended by considering a broader class of 
dictionaries. 

Additionally, a practical algorithm based on alternating 
least-squares minimization has been proposed. The algorithm 
is related to BSDO, and has been shown to converge to a 
local optimum under mild conditions. We observe encour- 
aging performance on image inpainting tasks without the 
need for parameter tuning or careful initialization. An avenue 
for further research is the treatment of observation noise, 
which constitutes a natural extension of our work, as well 
as extension of the theory beyone one-block sparsity. 
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Fig. 5. Top: original 256 x 256 "house" image. Bottom, left to right: test versions with 25%, 50% and 75% observed pixel values (the remainder are 
removed). 




Fig. 6. Inpainted 512 X 512 "Barbara" image with 50% observed pixels. The peak signal-to-noise ratio (PSNR) of this estimate is 27.93 dB. Maximum 
block size (fcmax) is eight and the total of dictionary elements is set to r = 256, divided in L = 32 blocks. Image patches have size 8x8 and are treated 
as vectors of dimension n = 64. As we employ overlapping patches, the total number of vectors h N = 255, 025. 
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Fig. 7. Inpainted 256 X 256 "House" image witli 50% observed pixels. Tlie peak signal-to-noise ratio (PSNR) of this estimate is 31.80 dB. Maximum 
block size (fcmax) is four and the total of dictionary elements is set to r = 256, divided in L = 64 blocks. Image patches have size 8x8 and are treated as 
vectors of dimension n = 64. As we employ overlapping patches, the total number of vectors is TV = 62, 001. 
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Fig. 8. Peak signal-to-noise ratio (PSNR) for inpainting the "Barbara" (left) and "house" (right) images with 25%, 50%, 75% and 100% observed pixel 
values, using maximum block size (femax) of four and eight. Results are averaged over ten realizations of the randomly missing pixel locations. The error 
bars depict one standard deviation. Using higher fcmax yields better PSNR (but also a less parsimonious model). 



(a) 



(b) 

Fig. 10. Learned dictionaries with Algorithm 1 for the "Barbara" image using fcmax = 4 (a) and fcmax = 8 (b). Each atom is shown as a 8 X 8 image. The 
atoms are arranged in a fcmax x L matrix, with all atoms in each column belonging to the same dictionary block. 
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(a) 



(b) 

Fig. 11. Learned dictionaries for the "house" image using fcmax = 4 (a) and femax = 8 (b). Each atom is shown as a 8 X 8 image. The atoms are arranged 
in a fcmax X L matrix, with all atoms in each column belonging to the same dictionary blocls. 




Fig. 12. Dictionary block usage for the "Barbara" image with fcmax = 8. The colors represent the index of the block used by each patch, for a subset 
containing the 22 most frequently used blocks, out of a total of 32. The subset is shown to the right, aligned with the indices of the colorbar, with each block 
corresponding of a row of subimages (atoms). It is apparent that patches with similar textures (e.g., the pants, portions of the scarf and table cloth) tend to 
share blocks (best viewed in color). 



